AITopics | edit type



Pico-Banana-400K: A Large-Scale Dataset for Text-Guided Image Editing

Qian, Yusu, Bocek-Rivele, Eli, Song, Liangchen, Tong, Jialing, Yang, Yinfei, Lu, Jiasen, Hu, Wenze, Gan, Zhe

arXiv.org Artificial Intelligence Oct-23-2025

Recent advances in multimodal models have demonstrated remarkable text-guided image editing capabilities, with systems like GPT-4o and Nano-Banana setting new benchmarks. However, the research community's progress remains constrained by the absence of large-scale, high-quality, and openly accessible datasets built from real images. We introduce Pico-Banana-400K, a comprehensive 400K-image dataset for instruction-based image editing. Our dataset is constructed by leveraging Nano-Banana to generate diverse edit pairs from real photographs in the OpenImages collection. What distinguishes Pico-Banana-400K from previous synthetic datasets is our systematic approach to quality and diversity. We employ a fine-grained image editing taxonomy to ensure comprehensive coverage of edit types while maintaining precise content preservation and instruction faithfulness through MLLM-based quality scoring and careful curation. Beyond single-turn editing, Pico-Banana-400K enables research into complex editing scenarios. The dataset includes three specialized subsets: (1) a 72K-example multi-turn collection for studying sequential editing, reasoning, and planning across consecutive modifications; (2) a 56K-example preference subset for alignment research and reward model training; and (3) paired long-short editing instructions for developing instruction rewriting and summarization capabilities. By providing this large-scale, high-quality, and task-rich resource, Pico-Banana-400K establishes a robust foundation for training and benchmarking the next generation of text-guided image editing models.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2510.19808

Genre: Research Report (0.53)

Industry: Media (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.51)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.37)


Learning an Image Editing Model without Image Editing Pairs

Kumari, Nupur, Wang, Sheng-Yu, Zhao, Nanxuan, Nitzan, Yotam, Li, Yuheng, Singh, Krishna Kumar, Zhang, Richard, Shechtman, Eli, Zhu, Jun-Yan, Huang, Xun

arXiv.org Artificial Intelligence Oct-17-2025

Recent image editing models have achieved impressive results while following natural language editing instructions, but they rely on supervised fine-tuning with large datasets of input-target pairs. This is a critical bottleneck, as such naturally occurring pairs are hard to curate at scale. Current workarounds use synthetic training pairs that leverage the zero-shot capabilities of existing models. However, this can propagate and magnify the artifacts of the pretrained model into the final trained model. In this work, we present a new training paradigm that eliminates the need for paired data entirely. Our approach directly optimizes a few-step diffusion model by unrolling it during training and leveraging feedback from vision-language models (VLMs). For each input and editing instruction, the VLM evaluates if an edit follows the instruction and preserves unchanged content, providing direct gradients for end-to-end optimization. To ensure visual fidelity, we incorporate distribution matching loss (DMD), which constrains generated images to remain within the image manifold learned by pretrained models. We evaluate our method on standard benchmarks and include an extensive ablation study. Without any paired data, our method performs on par with various image editing diffusion models trained on extensive supervised paired data, under the few-step setting. Given the same VLM as the reward model, we also outperform RL-based techniques like Flow-GRPO.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2510.14978

Country: Asia > Japan > Honshū > Chūbu > Nagano Prefecture > Nagano (0.04)

Genre: Research Report (1.00)

Industry: Media (0.58)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)


PixLens: A Novel Framework for Disentangled Evaluation in Diffusion-Based Image Editing with Object Detection + SAM

Stefanache, Stefan, Pérez, Lluís Pastor, Watanabe, Julen Costa, Tejedor, Ernesto Sanchez, Hofmann, Thomas, Simsar, Enis

arXiv.org Artificial Intelligence Oct-8-2024

Evaluating diffusion-based image-editing models is a crucial task in the field of Generative AI. Specifically, it is imperative to assess their capacity to execute diverse editing tasks while preserving the image content and realism. While recent developments in generative models have opened up previously unheard-of possibilities for image editing, conducting a thorough evaluation of these models remains a challenging and open task. The absence of a standardized evaluation benchmark, primarily due to the inherent need for a post-edit reference image for evaluation, further complicates this issue. Currently, evaluations often rely on established models such as CLIP or require human intervention for a comprehensive understanding of the performance of these image editing models. Our benchmark, PixLens, provides a comprehensive evaluation of both edit quality and latent representation disentanglement, contributing to the advancement and refinement of existing methodologies in the field.

category, edit type, evaluation, (15 more...)

arXiv.org Artificial Intelligence

2410.0571

Country: Europe > Switzerland > Zürich > Zürich (0.14)

Genre: Research Report (1.00)

Industry: Media > Photography (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
(2 more...)


Multi-Granularity Information Interaction Framework for Incomplete Utterance Rewriting

Du, Haowei, Zhang, Dinghao, Li, Chen, Li, Yang, Zhao, Dongyan

arXiv.org Artificial Intelligence Jan-8-2024

Recent approaches in Incomplete Utterance Rewriting (IUR) fail to capture the source of important words, which is crucial to edit the incomplete utterance, and introduce words from irrelevant utterances. We propose a novel and effective multi-task information interaction framework including context selection, edit matrix construction, and relevance merging to capture the multi-granularity of semantic information. Benefiting from fetching the relevant utterance and figuring out the important words, our approach outperforms existing state-of-the-art models on two benchmark datasets in this field, Restoration-200K and CANARD. Code will be provided at https://github.com/yanmenxue/QR.

denote, incomplete utterance, utterance, (13 more...)

arXiv.org Artificial Intelligence

2312.11945

Country:

Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
Asia > Middle East > Jordan (0.04)
Asia > China (0.04)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.50)


Motion-Conditioned Image Animation for Video Editing

Yan, Wilson, Brown, Andrew, Abbeel, Pieter, Girdhar, Rohit, Azadi, Samaneh

arXiv.org Artificial Intelligence Nov-30-2023

Image and video generation models have made tremendous progress, with existing models able to synthesize highly complex images [26, 27, 28, 30, 6] or videos [37, 31, 2, 15, 12] given textual descriptions. Beyond generating purely novel content, these models have been shown to be powerful tools for advanced image and video editing in downstream content creation. Given a source video, a caption of the source video, and an editing textual prompt, a video editing method should produce a new video that is aligned with the provided editing prompt while retaining faithfulness to all other non-edited characteristics of the original source video. Video edit types can be broadly split into two main categories: spatial and temporal edits. Spatial edits generally consist of image-based edits extended to video, such as editing a video in the style of Van Gogh, inserting an object into the scene, or changing the background. Due to the added temporal dimension in video, we can also change the underlying motion of the object, such as making a panda play in a pile of ribbons, or replacing apricots in a video with apples and making them fall off a tree (see Figure 1).

arxiv preprint arxiv, source video, video, (14 more...)

arXiv.org Artificial Intelligence

2311.18827

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > New York (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)


Dancing Between Success and Failure: Edit-level Simplification Evaluation using SALSA

Heineman, David, Dou, Yao, Maddela, Mounica, Xu, Wei

arXiv.org Artificial Intelligence Oct-22-2023

Large language models (e.g., GPT-4) are uniquely capable of producing highly rated text simplification, yet current human evaluation methods fail to provide a clear understanding of systems' specific strengths and weaknesses. To address this limitation, we introduce SALSA, an edit-based human annotation framework that enables holistic and fine-grained text simplification evaluation. We develop twenty-one linguistically grounded edit types, covering the full spectrum of success and failure across dimensions of conceptual, syntactic and lexical simplicity. Using SALSA, we collect 19K edit annotations on 840 simplifications, revealing discrepancies in the distribution of simplification strategies performed by fine-tuned models, prompted LLMs and humans, and find that GPT-3.5 performs more quality edits than humans but still exhibits frequent errors. Using our fine-grained annotations, we develop LENS-SALSA, a reference-free automatic simplification metric, trained to predict sentence- and word-level quality simultaneously. Additionally, we introduce word-level quality estimation for simplification and report promising baseline results. Our data, new metric, and annotation toolkit are available at https://salsa-eval.com.

computational linguistic, information, simplification, (13 more...)

arXiv.org Artificial Intelligence

2305.14458

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
(19 more...)

Genre: Research Report (1.00)

Industry:

Law (0.67)
Government (0.67)
Media (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.72)


ABCD: A Graph Framework to Convert Complex Sentences to a Covering Set of Simple Sentences

Gao, Yanjun, Huang, Ting-hao, Passonneau, Rebecca J.

arXiv.org Artificial Intelligence Jun-22-2021

Atomic clauses are fundamental text units for understanding complex sentences. Identifying the atomic sentences within complex sentences is important for applications such as summarization, argument mining, discourse analysis, discourse parsing, and question answering. Previous work mainly relies on rule-based methods dependent on parsing. We propose a new task to decompose each complex sentence into simple sentences derived from the tensed clauses in the source, and a novel problem formulation as a graph edit task. Our neural model learns to Accept, Break, Copy or Drop elements of a graph that combines word adjacency and grammatical dependencies. The full processing pipeline includes modules for graph construction, graph editing, and sentence generation from the output graph. We introduce DeSSE, a new dataset designed to train and evaluate complex sentence decomposition, and MinWiki, a subset of MinWikiSplit. ABCD achieves performance comparable to two parsing baselines on MinWiki. On DeSSE, which has a more even balance of complex sentence types, our model achieves higher accuracy on the number of atomic sentences than an encoder-decoder baseline. Results include a detailed error analysis.

artificial intelligence, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2106.12027

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Pennsylvania (0.04)
(5 more...)

Genre: Research Report (0.50)

Industry: Education > Curriculum > Subject-Specific Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.93)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.75)


A Comprehensive Trainable Error Model for Sung Music Queries

Birmingham, W. P., Meek, C. J.

arXiv.org Artificial Intelligence Jun-30-2011

We propose a model for errors in sung queries, a variant of the hidden Markov model (HMM). This is a solution to the problem of identifying the degree of similarity between a (typically error-laden) sung query and a potential target in a database of musical works, an important problem in the field of music information retrieval. Similarity metrics are a critical component of query-by-humming (QBH) applications which search audio and multimedia databases for strong matches to oral queries. Our model comprehensively expresses the types of error or variation between target and query: cumulative and non-cumulative local errors, transposition, tempo and tempo changes, insertions, deletions and modulation. The model is not only expressive, but automatically trainable, or able to learn and generalize from query examples. We present results of simulations, designed to assess the discriminatory potential of the model, and tests with real sung queries, to demonstrate relevance to real-world applications.
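As a rough illustration of the error types named in this abstract (transposition, insertions, deletions), the sketch below compares a sung query with a target melody. It is not the authors' HMM: it swaps in a plain Levenshtein edit distance over semitone intervals as a simple, untrained baseline. The MIDI-pitch representation and the names `to_intervals`, `edit_distance`, and `melody_distance` are assumptions made for this example only.

```python
from typing import List, Sequence


def to_intervals(pitches: Sequence[int]) -> List[int]:
    """Convert absolute MIDI pitches to successive semitone intervals.

    Intervals are unchanged when the whole melody is shifted up or down,
    so a transposed query looks identical to its target.
    """
    return [b - a for a, b in zip(pitches, pitches[1:])]


def edit_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Levenshtein distance with unit-cost insert, delete, and substitute."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x from a
                            curr[j - 1] + 1,      # insert y into a
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def melody_distance(query: Sequence[int], target: Sequence[int]) -> int:
    """Transposition-invariant distance between two pitch sequences."""
    return edit_distance(to_intervals(query), to_intervals(target))


# Hypothetical example data: a five-note target and two error-laden queries.
target = [60, 62, 64, 65, 67]
transposed = [p + 3 for p in target]  # sung three semitones too high
dropped = [60, 62, 65, 67]            # one note (64) omitted

print(melody_distance(transposed, target))  # 0
print(melody_distance(dropped, target))     # 2
```

A single dropped note costs two interval edits here because it disturbs both neighbouring intervals. A fixed unit-cost scheme cannot express or learn such dependencies, nor cumulative errors and tempo changes, which the abstract's trainable model is described as covering.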

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

doi: 10.1613/jair.1334

1107.0054

Country:

North America > United States > Michigan (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Oceania > New Zealand (0.04)
(4 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)


A Comprehensive Trainable Error Model for Sung Music Queries

Meek, C. J., Birmingham, W. P.

Journal of Artificial Intelligence Research Aug-1-2004

We propose a model for errors in sung queries, a variant of the hidden Markov model (HMM). This is a solution to the problem of identifying the degree of similarity between a (typically error-laden) sung query and a potential target in a database of musical works, an important problem in the field of music information retrieval. Similarity metrics are a critical component of 'query-by-humming' (QBH) applications which search audio and multimedia databases for strong matches to oral queries. Our model comprehensively expresses the types of error or variation between target and query: cumulative and non-cumulative local errors, transposition, tempo and tempo changes, insertions, deletions and modulation. The model is not only expressive, but automatically trainable, or able to learn and generalize from query examples. We present results of simulations, designed to assess the discriminatory potential of the model, and tests with real sung queries, to demonstrate relevance to real-world applications.

comprehensive trainable error model, probability, query, (14 more...)

Journal of Artificial Intelligence Research

doi: 10.1613/jair.1334

AI Access Foundation

10385


Country:

North America > United States > Michigan (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Oceania > New Zealand (0.04)
(4 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
